Skip to content

Conversation

WaffleLapkin
Copy link
Member

This PR adds the following functions for pointers to str, similar to already existing functions for pointers to [T].

impl *const str {
    pub const fn len(self) -> usize;
    pub const fn as_ptr(self) -> *const u8;
    pub unsafe fn get_unchecked<I>(self, index: I) -> *const I::Output
    where
        I: SliceIndex<str>;
}

impl *mut str {
    pub const fn len(self) -> usize;
    pub const fn as_mut_ptr(self) -> *mut u8;
    pub unsafe fn get_unchecked_mut<I>(self, index: I) -> *mut I::Output
    where
        I: SliceIndex<str>;
}

// mod ptr
pub const fn str_from_raw_parts(data: *const u8, len: usize) -> *const str;
pub const fn str_from_raw_parts_mut(data: *mut u8, len: usize) -> *mut str;

impl NonNull<str> {
    pub const fn str_from_raw_parts(data: NonNull<u8>, len: usize) -> Self;
    pub const fn len(self) -> usize;
    pub const fn as_non_null_ptr(self) -> NonNull<u8>;
    pub const fn as_mut_ptr(self) -> *mut u8;
    pub unsafe fn get_unchecked_mut<I>(self, index: I) -> NonNull<I::Output>
    where
        I: SliceIndex<str>;
}

I wasn't sure about feature gates and how to group them, so the advice is very appreciated.


This PR also adds const_str_ptr and mut_str_ptr lang items (they are required for *const str and *mut str impls).


See also:

@rust-highfive
Copy link
Contributor

r? @m-ou-se

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label May 29, 2021
@jfrimmel
Copy link
Contributor

I'm unsure about ptr.as_ptr(), as we already have a pointer. Maybe use as_bytes() instead?

@WaffleLapkin
Copy link
Member Author

@jfager I would expect that as_bytes returns some sort of byte slice, e.g. it's *const str -> *const [u8], I don't think it's a good name for a *const str -> *const u8 function. Maybe as_ptr isn't a great name either, but it's consistent with <*const [T]>::as_ptr.

@jfager
Copy link
Contributor

jfager commented May 30, 2021

@WaffleLapkin looks like you tagged the wrong person!

@WaffleLapkin
Copy link
Member Author

WaffleLapkin commented May 30, 2021

Oops, right. I've meant to tag @jfrimmel 😅

@m-ou-se
Copy link
Member

m-ou-se commented May 31, 2021

The len functions look fine to me. For those, you can just use the same tracking issue as for [T]: #71146

The as_[mut_]ptr and get_unchecked ones also look fine. Feel free to just re-use #74265.

But I'm not sure about the str_from_raw_parts, because we don't have a &str variant (yet?). Right now, you'd have to use slice::from_raw_parts and str::from_utf8_unchecked to make a &str from raw parts. If we have a str_from_raw_parts for *const str, I'm worried that might be confusing, since users searching for a &str will only find the *const str version. Is this ptr::str_from_raw_parts that useful? Should we leave it out? Or maybe we need one for &str too? (Part of the reason it's good it needs two steps, is that it makes it clear there's two unsafe assumptions: One about the pointer+len being valid, and one about the UTF-8 encoding being valid.)

@m-ou-se m-ou-se added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 31, 2021
@WaffleLapkin
Copy link
Member Author

@m-ou-se my original motivation for ptr::str_from_raw_parts was that it seemed weird to have methods on *const str without a convenient way to create such pointers. &str on the other hand can be easily created, e.g. str::from_utf8(slice::from_raw_parts(ptr, len))?. Though for symmetry and convenience str::from_raw_parts seems nice.

Part of the reason it's good it needs two steps, is that it makes it clear there's two unsafe assumptions: One about the pointer+len being valid, and one about the UTF-8 encoding being valid.

Then maybe instead of ptr::str_from_raw_parts we could have str::from_raw_utf8(*const [u8]) -> *const str.


Not sure which way is better.

@joelpalmer joelpalmer added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jun 15, 2021
@crlf0710
Copy link
Member

crlf0710 commented Jul 4, 2021

@WaffleLapkin Ping from triage, what's next steps here?

@m-ou-se
Copy link
Member

m-ou-se commented Jul 5, 2021

Then maybe instead of ptr::str_from_raw_parts we could have str::from_raw_utf8(*const [u8]) -> *const str.

Yeah maybe. I'd still put that in ptr:: instead of str:: though. Most users of str aren't interested in raw pointers, so it's probably good to keep all the raw pointer functions inside the ptr module.

cc @rust-lang/libs-api Any thoughts on these functions?

@dtolnay
Copy link
Member

dtolnay commented Jul 5, 2021

I don't see why the safety requirement of get_unchecked involves utf8 boundaries. I would expect only the same safety requirements as https://doc.rust-lang.org/std/primitive.pointer.html#method.add.

Please document safety requirements in a # Safety section.

Can we have safe indexing that works like https://doc.rust-lang.org/std/primitive.pointer.html#method.wrapping_add ?

@WaffleLapkin
Copy link
Member Author

I don't see why the safety requirement of get_unchecked involves utf8 boundaries. I would expect only the same safety requirements as https://doc.rust-lang.org/std/primitive.pointer.html#method.add.

Hm. So *const str doesn't have utf8 guarantees, and it's only unsafe to create non-utf8 &str?

@WaffleLapkin
Copy link
Member Author

So, to summarize the state of this PR. ​

  1. len, as_ptr, as_mut_ptr, as_non_null_ptr and get_unchecked are considered ok, I just need to fill tracking issues.
  2. str_from_raw_parts{_mut} is more controversial, since it may be confusing to have a function *const u8, usize -> *const str, but not the -> &str. There are a couple ways to 'fix' this, e.g.:
    • Add a ptr::str_from_utf8(*const [u8]) -> *const str function instead. This way we reuse slice ptr functions (like ptr::slice_from_raw_parts) and split safety into 2 parts: ptr validity and utf8 validity.
    • Add a -> &str counterpart too, like str::from_raw_parts (similar to slice::from_raw_parts)

I'm waiting for the decision on str_from_raw_parts{_mut} to fix everything and one go.

@bors
Copy link
Collaborator

bors commented Jul 28, 2021

☔ The latest upstream changes (presumably #85769) made this pull request unmergeable. Please resolve the merge conflicts.

@m-ou-se m-ou-se added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jul 28, 2021
@JohnCSimon
Copy link
Member

Triage: Still has conflicts

@bors
Copy link
Collaborator

bors commented Aug 18, 2021

☔ The latest upstream changes (presumably #86808) made this pull request unmergeable. Please resolve the merge conflicts.

@bors
Copy link
Collaborator

bors commented Aug 25, 2021

☔ The latest upstream changes (presumably #87875) made this pull request unmergeable. Please resolve the merge conflicts.

The functions are under feature gate `str_from_raw_parts` and are similar to
`slice_from_raw_parts`, `slice_from_raw_parts_mut`.
@rust-log-analyzer

This comment has been minimized.

These items allow to make inherent impls for `*const str` and `*mut str`.
This patch adds the following methods to `*const str` and `*mut str`:
- `len`
- `as_ptr` (`as_mut_ptr`)
- `get_unchecked` (`get_unchecked_mut`)

Similar methods have already existed for raw slices.
This patch adds the following methods to `NonNull<str>`:
- `str_from_raw_parts`
- `len`
- `as_non_null_ptr`
- `as_mut_ptr`
- `get_unchecked_mut`

Similar methods have already existed for raw slices, raw strings and nonnull
raw strings.
@JohnCSimon JohnCSimon added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 20, 2021
@camelid camelid added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 8, 2021
@JohnCSimon JohnCSimon added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 24, 2021
@JohnCSimon JohnCSimon added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 16, 2021
@bors
Copy link
Collaborator

bors commented Nov 25, 2021

☔ The latest upstream changes (presumably #91203) made this pull request unmergeable. Please resolve the merge conflicts.

@WaffleLapkin
Copy link
Member Author

Closing in favour of #91216 and #91218

@WaffleLapkin WaffleLapkin deleted the str_ptr_len_get branch December 21, 2021 00:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.